MIT Sloan Health Systems Initiative

Vivek Farias: Causal Inference in Large-Scale Observational Omics Data

Professor Farias's work focuses on creating methods to optimize large data sets that contain regions of sparse data. His methods make it possible to draw inferences from such data and to prove that those inferences are valid, a problem that has stumped researchers. The technique is important in, for example, the use of AI in drug development.

HSI funded some of his prior work in this area, and his platform has already shown promise as a tool to generate proteomic biomarkers at scale for both non-small-cell lung cancer (NSCLC) and Alzheimer’s disease. His work enables new and more accurate conclusions that could not be reached otherwise.

His newest framework, which builds on the previously funded research, shows promise for creating biologically testable hypotheses that aid in developing new therapies, repurposing drugs, and making some new diagnostic assays more reliable. His focus is on observational data related to proteomics, that is, the study of proteins in a cell, with the ultimate goal of diagnosing illness and discovering new therapeutics.

Farias’s methods enable researchers to pinpoint a subset of the data that they are able to parse and that is representative of the larger, complex set, giving them an understanding of the entire data set from this specifically chosen sample. The method also offers indications that can lead to approaches for understanding causal mechanisms.

This work has produced both theoretical and practical successes. The team’s theoretical work on causal analysis was recently accepted for the upcoming Conference on Neural Information Processing Systems (NeurIPS), one of only 55 papers accepted for full presentation out of more than 10,000 submissions.

On the practical side, the team, in collaboration with the Broad Institute, tested its methods on a large-scale data set and successfully demonstrated their power to overcome some of the limitations in collecting proteomic data.

Farias and his team have recently applied for funding for a follow-on project under the auspices of the MIT-Takeda Alliance.